An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing

Authors

  • David P. Hughes
  • Marcel Salathé
Abstract

Human society needs to increase food production by an estimated 70% by 2050 to feed a population that is predicted to exceed 9 billion people. Currently, infectious diseases reduce the potential yield by an average of 40%, with many farmers in the developing world experiencing yield losses as high as 100%. Infectious diseases of crops are not new, as historic examples such as the Irish Potato Famine of 1845-49 demonstrate. What is new is the widespread distribution of smartphones among crop growers around the world, with an expected 5 billion smartphones by 2020. This offers the potential of turning the smartphone into a valuable tool for diverse communities growing food. One potential application is the development of mobile disease diagnostics through machine learning and crowdsourcing. Computer vision and machine learning have shown their potential to automatically classify images. To do this for plant diseases requires a training set that facilitates the development of the algorithms. Here we announce the release of >50,000 expertly curated images of healthy and infected leaves of crop plants through the existing platform www.PlantVillage.org. We describe both the data and the platform. These data are the beginning of an ongoing crowdsourcing effort to enable computer vision approaches to help solve the problem of yield losses in crop plants due to infectious diseases.

Keywords: plant pathology, food security, machine learning, crowdsourcing

The origins and continued evolution of agriculture in the face of infectious diseases and pests

Perhaps the greatest technological advance that humans have ever made was the domestication of plants during the agricultural revolution 8,000-12,000 years ago at multiple sites around the world (Diamond 2002). These events literally created civilizations through the steady, predictable supply of calories that could be obtained with lower and lower amounts of energy expended through human work.
The enormous increase in human population from around 1 billion in the early 1800s to our current situation of over 7.2 billion has been made possible by an efficient and productive agricultural basis (Borlaug 2000). However, this steady supply of calories is currently threatened, as many of the advances resulting from the Green Revolution of the 1950s are failing due to infectious diseases and pests spread by globalization and compounded by climate change (Bourne 2015). Diseases of plants are not new, of course. Many histories have recorded the enormous impact that infectious diseases have had on growing food crops (Ainsworth 1981). The iconic example is the Irish Potato Famine of 1845-49, where an overdependence on a single crop with little genetic diversity set the stage for a devastating decline in yield that resulted in 1.2 million deaths out of a population of 9 million (Woodham-Smith 1991). At the time, plant diseases were poorly studied, and the Irish Potato Famine is widely identified as the event that began the scientific discipline of plant pathology (Carefoot and Sprott 1967). It also marks the beginning of the formal system of extension, where science-based knowledge is provided to farmers by state-employed extension workers (Jones and Garforth 2005). In the intervening 170 years, plant pathology has grown to become a robust discipline offering crop growers multiple solutions, from chemical control to genetic engineering to integrated pest management. At the same time, formalized extension systems have likewise developed to provide knowledge to growers. But the fact that the global food supply is annually reduced by an average of 40% (Oerke 2006) demonstrates that our collective battle against diseases and pests of crop plants is not won. In fact, the emergence and spread of novel and highly virulent crop diseases like the stem rust UG99 that attacks wheat, black pod in cocoa and viral infections of cassava suggest that the situation may be worsening.
This is troubling at a time when the UN FAO recommends that we must in fact increase the food supply by 70% to feed the future population (http://www.fao.org/fileadmin/templates/wsfs/docs/expert_paper/How_to_Feed_the_World_in_2050.pdf).

The effect of modern agriculture is not just its ability to feed large populations in the face of disease threats. Given the long history we humans have had in growing food, it is not surprising that how we grow food shapes the social, psychological, and physical development of individuals and communities. In a large study of communities of rice and wheat farmers in China, for example, researchers found that rice farming changed people's psychology, resulting in more interdependent communities (Talhelm et al. 2014). Recently, two studies recorded how the transition from hunter-gatherers to farmers reduced the strength of our bones (Chirchir et al. 2015, Ryan and Shaw 2015) and teeth (Pinhasi et al. 2015). And at the societal level it is a truism that no country has achieved economic independence and a strong manufacturing sector without first developing a strong agricultural sector. This was (regrettably) demonstrated in a number of African countries in the 1950s which, post-independence, abandoned a system of coordinated agricultural knowledge development and dissemination in favor of industrial growth (Eicher 1999). The result was a weakened agricultural sector that still exists today, which necessitates the efforts of the Alliance for a Green Revolution in Africa (AGRA) (Sanchez 2015). Therefore, an examination of humans at the psychological, physical and societal levels demonstrates that, as an agent of change, agriculture is one of the most potent.

Agriculture is culture. And like all human culture, it is constantly evolving. For example, in the USA, the way people have farmed, and what proportion of the population actually farms, has shifted considerably.
These changes reflect increased urbanization and greater efficiency at the farm level via the incorporation of new technologies, from fertilizers and pesticides to better farm machinery to more recent advances such as precision agriculture and genetically modified crops. In America in 2015, few people are fully engaged as farmers: less than 1% of the entire population, according to the United States Department of Agriculture (Woteki 2012). It would be easy to think that food growing is not that relevant to most Americans or other residents of countries in the industrialized West. But of course that is not true. A cursory glance at the media shows an increasingly high level of concern about food production, and many people express unease at the dominance of Big Ag (e.g. Monsanto), the increased focus on growing monocultures (e.g. corn) for both meat production and biofuels, and of course the public concern over GMOs (Bourne 2015). Perhaps reflecting this change is the growing movement called the New American Farmer (Bartling 2012). Since the 1980s we have witnessed an increase in smallholder farmers who seek to promote diverse practices and crops. The essence of such movements was captured in Michael Pollan's book, The Omnivore's Dilemma: A Natural History of Four Meals (Pollan 2006). Recognizing this, public health institutions are increasingly encouraging consumers to grow their own food, as both the process of growing food (exercise through gardening) and the yield (fresh fruit and vegetables) contribute to beneficial public health outcomes. Indeed, gardening, and urban gardening in particular, has become increasingly popular in the past two decades, reversing a long trend of consumers becoming less and less involved in growing their food. Globally, Google Trends captures this and reports a steady increase of interest in "urban gardening".
(Google Trends source: https://www.google.ch/trends/explore#q=Urban%20gardening)

Whereas many communities in developed countries are choosing to grow food at small scales, in the developing world such small-scale farming is the norm. This is an imposed necessity rather than a choice. In many countries (e.g. those in Sub-Saharan Africa) as much as 80% of the population are farmers, with single families growing diverse crops on small (2-5 hectare) plots with minimal mechanical or chemical (fertilizer, pesticides) inputs (Sanchez 2015). At this scale the relative impact of yield gaps (the gap between potential and actual yield) is very high (Collier and Dercon 2009, Foley et al. 2011). Infectious diseases and pests consistently reduce our ability to close this gap. Since most subsistence agriculture today occurs around the tropics, and since the biodiversity of all infectious diseases (of humans, animals and plants) is higher in the tropics, the pressure of disease is greatest in these areas. It is commonplace for smallholder farmers to lose 80-100% of a given crop to pests and diseases (Oerke 2006).

Taken together, then, we can see that agriculture is an ancient technology of great relevance to both human society and human psychology, and that it is currently practiced around the world in diverse modes that are under constant cultural evolutionary change. In a world that is increasingly hot, flat and crowded, the importance of growing enough food cannot be overstated. The difficulties are many but largely revolve around the need to close the yield gap by reducing losses due to diseases, pests and poor management practices. In the 170 years since the Irish Potato Famine we have learned a considerable amount about the science of reducing the yield gap. But our current situation of diverse food growing arenas, from community gardens to mega farms to smallholder farms, has meant that finding relevant knowledge can be difficult.
Depending on the country and location of the food grower, they may have access to professional services providing help. For many professional growers, crop insurance mitigates the losses. Further, in many countries, extension services provide critical support in dealing with the threat of pests. But in many places, access to these services is limited or completely absent. In recent years, the increasing accessibility of the Internet in almost any place on the globe has offered some hope that such services can be provided to everyone. In the next section we outline our approach to these diverse challenges by leveraging the computing power of mobile devices that are now ubiquitous across the globe.

PlantVillage: a tool for crop health

Three years ago, we co-founded an online platform dedicated to plant health and diseases, called PlantVillage (available at www.plantvillage.org). This platform was modeled after popular online platforms in the computer programming domain, including Stack Overflow (www.stackoverflow.com), a community-driven forum where anyone can ask and answer questions related to programming. The web is full of Q&A sites, and these have evolved considerably over the last 15 years, from simple Usenet sites to massive portals like Stack Overflow that attract over 45 million people each month (QuantCast 2015). In many communities the questions and answers are then subject to up-voting and down-voting by the community. By providing many answers that are up-voted, users can build an online reputation, captured by a numeric score. The higher the score, the more rights a user gets on the platform. For example, users need a certain score to be able to down-vote other contributions. An even higher score is needed in order to be able to edit other people's contributions. This model, which has worked very well in many different contexts, has also been successful in PlantVillage, and the platform has seen its traffic grow 250% year over year.
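
The reputation model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says that down-voting requires "a certain score" and editing "an even higher score" but gives no numbers, so the thresholds, the points per up-vote, and the names `User`, `PRIVILEGE_THRESHOLDS` and `UPVOTE_REWARD` are all hypothetical and do not describe the actual PlantVillage or Stack Overflow implementation.

```python
from dataclasses import dataclass

# Hypothetical values: the source gives no concrete numbers.
PRIVILEGE_THRESHOLDS = {"down_vote": 100, "edit_others": 1000}
UPVOTE_REWARD = 10  # hypothetical points earned per up-voted answer


@dataclass
class User:
    name: str
    reputation: int = 0

    def receive_upvote(self) -> None:
        """An up-vote on one of the user's answers raises their score."""
        self.reputation += UPVOTE_REWARD

    def can(self, action: str) -> bool:
        """A privilege is unlocked once the score reaches its threshold."""
        return self.reputation >= PRIVILEGE_THRESHOLDS[action]


def demo():
    user = User("grower")
    for _ in range(12):  # twelve up-votes on the user's answers
        user.receive_upvote()
    return user.reputation, user.can("down_vote"), user.can("edit_others")
```

In this sketch the more disruptive privilege (editing other people's contributions) simply has the higher threshold, which mirrors the ordering given in the text.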
In the fall of 2015, we expect to welcome the 2 millionth visitor to the site. In addition to this crowdsourced problem-solving, we have also created a library of open access information on over 150 crops and over 1,800 diseases, accessible on the same website. This content has been written by plant pathology experts, reflecting information sourced from the scientific literature. However, as the site is targeted directly at food growers, rather than professional plant pathologists, great care has been taken to write the content in a way that is easy to understand. Currently, most content is written in English, but we have recently begun to translate it into French, Spanish, and Portuguese. We will continue to translate more content into more languages.

While human-assisted disease diagnosis is a powerful tool, we believe that machine-assisted disease diagnosis has enormous potential. Disease phenotyping, when done by humans, usually involves a visual analysis of the presentation of the disease on the plant. For some visual phenotypes, disease identification by visual cues is straightforward; for others, it may be more challenging. Either way, visual diagnosis, where possible, has so far required humans. However, despite the challenge of crop health on an increasingly crowded planet (food security), investment in training plant pathologists has not grown correspondingly, and has often even decreased (Flood 2010). If a visual diagnosis (by a human) is possible, then computational tools should, in principle, be able to support the human diagnostician. In many cases, a computational diagnostic tool would indeed be the only way to get a diagnosis, due to the absence of expert help in many parts of the world. Even where human diagnostic expertise is available, scaling it to match global demand is not trivial. Since the 1980s the UN FAO has promoted Farmer Field Schools, which focus on improving crop health in developing world countries (Braun et al. 2000). More recently the Plantwise Clinics of CABI have undertaken similar efforts (Nicholls 2015). While both are excellent, they are not scalable without the sort of investment seen in developed world countries 150 years ago (Jones and Garforth 2005). This implies that a computational system that could aid with disease diagnosis, either alone or as support, would be both enormously beneficial and inherently scalable if provided online. Such a system would need the ability to recognize a disease from an image, and would thus be an image recognition system based on artificial intelligence.

Recent developments in software and leveraging the power of groups

In recent years, remarkable progress has been made towards the goal of developing artificial intelligence (Ghahramani 2015). Fueled by breakthroughs in machine learning algorithm development, cheap computing, and cheap storage of very large data sets, artificial intelligence has permeated our everyday digital experiences. Whether it is a product recommendation based on our past consumer history, the automatic detection of a friend's name based on a photo uploaded to a social media service, or the automatic language detection and translation of webpages, the accuracy of such tools (i.e. recommendation, visual recognition, translation) is now so high that consumers are adopting them very rapidly. Such services are not just at play in consumer arenas but are also being used in medical settings, such as high-throughput screening of radiographs to detect signs of cancer (Wang and Summers 2012). Machine learning is a computational way of detecting patterns in a given dataset in order to make inferences in another, similar dataset. A classical textbook example is the machine recognition of handwriting, such as postal addresses on envelopes. In recent years, generic object recognition has made tremendous advances, and is now approaching human accuracy.
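
The definition of machine learning given above (detecting patterns in one dataset in order to make inferences in another, similar dataset) can be made concrete with a deliberately tiny sketch. It uses a nearest-centroid classifier, which is a stand-in chosen for brevity and is not the convolutional neural network approach discussed in this article, nor anything the authors describe using. The two-number "features" (imagined here as the fraction of green and of brown pixels in a leaf image) and the toy values are hypothetical.

```python
def fit_centroids(features, labels):
    """Learn one average feature vector (centroid) per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Hypothetical training set: (green fraction, brown fraction) per leaf image.
train_x = [[0.8, 0.1], [0.7, 0.2], [0.3, 0.6], [0.4, 0.7]]
train_y = ["healthy", "healthy", "diseased", "diseased"]
model = fit_centroids(train_x, train_y)

# Inference on previously unseen examples.
new_predictions = [predict(model, [0.72, 0.18]), predict(model, [0.2, 0.8])]
```

Real image classifiers learn their features from raw pixels rather than relying on two hand-picked numbers, but the split between a fitting step on labeled data and a prediction step on new data is the same.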
In facial recognition, for example, the DeepFace algorithm developed by Facebook researchers has achieved an accuracy that matches humans (Taigman et al. 2014). These developments have been fueled by a variety of advances, but the most striking breakthroughs have come from the field of neural networks, and convolutional neural networks in particular (Krizhevsky et al. 2012). Additionally, advances in computing chips, notably GPUs (Graphics Processing Units), which can be linked together to form networks of computers, have also been a key innovation.

The development of these algorithmic breakthroughs has come from three main sources (LeCun et al. 2015). The first is traditional academic research at universities and other institutes of higher education, where machine learning and related fields are increasingly important domains not just within computer science, but in many other fields as well. The second source is industry, in particular some of the key players of the digital economy such as Facebook, Amazon, and Google. Google Research, for example, lists over 450 scholarly papers that it has published in Artificial Intelligence and Machine Learning alone (http://research.google.com/pubs/ArtificialIntelligenceandMachineLearning.html). Facebook and others are catching up quickly, especially as they manage to attract the world's top researchers, given their financial resources and privileged access to data sets. Finally, the third source is perhaps the most surprising: the crowd.

Crowdsourcing is an increasingly common practice of soliciting services from large groups of people online. It has been popularized by services such as Amazon Mechanical Turk, where researchers and companies can ask large groups of people to perform tasks or provide services in exchange for money (Paolacci et al. 2010).
Traditionally, these crowdsourced tasks have been simple for humans to do (such as assessing the sentiment of short texts), but increasingly, crowdsourcing is used as a method to find solutions to very hard problems as well. One of the most famous recent examples was the 2009 Netflix Prize, where Netflix offered $1 million to anyone who was able to improve their recommendation algorithm by 10%. It was awarded to a team of engineers who published the algorithm openly after the competition (in line with the competition rules). There are now numerous data science competition platforms, where researchers or companies can run competitions on their datasets, usually in exchange for monetary prizes or academic recognition. Another, more vertically focused example of crowdsourcing is ImageNet. The "ImageNet Large Scale Visual Recognition Challenge" has, in just 5 years, become the benchmark in visual object category classification. The challenge started in 2010 at Stanford University and has since attracted participants from over 50 institutions. Many of the key breakthroughs in visual image classification have come through participants in this challenge (Russakovsky et al. 2014).

PlantVillage Images

In order to leverage the potential of crowdsourcing algorithmic development, we are today releasing a data set of tens of thousands of images of healthy and diseased plants, labeled by plant pathology experts. Over 50,000 of these images are now stored on www.PlantVillage.org, openly accessible and released under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license, with the clarification that algorithms trained on the data fall under the same license. We chose this license to ensure that any disease diagnostic algorithm developed on the data can be made freely available to anyone. The data set will continue to grow over the coming months and years. The list of crops is given in Table 1. Examples of the images and different phenotypes are given in Figure 1.
All the images in the PlantVillage database were taken at experimental research stations associated with Land Grant Universities in the USA (Penn State, Florida State, Cornell, and others). We are continuing to collect images, and in the future, the list of sources we draw from will increase. Experimental research stations (both public and private) offer the possibility of taking many images in a short amount of time. The majority of the images were taken by two technicians working as a team. From field trials of crops infected with one disease, the technicians would collect leaves by removing them from the plant. The leaves were then placed against a paper sheet that provided a grey or black background. All images were taken outside under full light. Lighting ranged from strong sun to cloud, and we intentionally sought a range of conditions because the end user (a grower with a smartphone) will ultimately take images under a range of conditions. For each leaf, we typically took 4-7 images with a standard point-and-shoot digital camera in automatic mode (Sony DSC Rx100/13 20.2 megapixels). The leaf was rotated through 360 degrees as we imaged. We found this was important because, depending on both the reflectance and the nature of the disease, multiple images allowed us to capture more data. For crops such as corn (Zea mays) and squash (Cucurbita spp.), the leaves were too large to capture in a single frame while retaining high-resolution, close-proximity views. In these cases, we took images of different sections of the same leaf. Once the images were collected, they were edited by cropping away much of the background and orienting all leaves so that their tips pointed upwards.

Table: List of crops and their diseases currently (November 24 2015) in the PlantVillage database (www.plantvillage.org)

Crop | Fungi | Bacteria | Mold | Virus | Mite
Apple | Gymnosporangium juniperi-virginianae
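
Because the imaging protocol in this section produces several images of each leaf (typically 4-7, or several sections of one large leaf), anyone training a classifier on such data may want to keep all images of a given leaf on the same side of a train/test split, so that near-duplicate views do not inflate measured accuracy. The article does not discuss this point; the sketch below is one possible approach under stated assumptions. The record layout `(image_name, leaf_id, label)`, the file names, and the function `split_by_leaf` are hypothetical and do not reflect how the PlantVillage data are actually organized.

```python
import random
from collections import defaultdict


def split_by_leaf(records, test_fraction=0.2, seed=0):
    """Split (image_name, leaf_id, label) records so that every image of a
    given leaf lands entirely in the training set or entirely in the test set."""
    by_leaf = defaultdict(list)
    for rec in records:
        by_leaf[rec[1]].append(rec)

    leaf_ids = sorted(by_leaf)
    random.Random(seed).shuffle(leaf_ids)  # seeded for reproducibility

    n_test = max(1, round(len(leaf_ids) * test_fraction))
    test = [r for lid in leaf_ids[:n_test] for r in by_leaf[lid]]
    train = [r for lid in leaf_ids[n_test:] for r in by_leaf[lid]]
    return train, test


# Hypothetical example: 10 leaves, 4 images per leaf.
records = [
    (f"leaf{leaf}_img{i}.jpg", leaf, "healthy" if leaf % 2 == 0 else "diseased")
    for leaf in range(10)
    for i in range(4)
]
train, test = split_by_leaf(records)
```

Splitting on the leaf identifier rather than on individual images is the design choice that matters here; the 80/20 ratio and the fixed seed are arbitrary.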




Journal: CoRR

Volume: abs/1511.08060

Publication date: 2015